Training Algorithms for Linear Text Classifiers
Authors
Abstract
Systems for text retrieval, routing, categorization and other IR tasks rely heavily on linear classifiers. We propose that two machine learning algorithms, the Widrow-Hoff and EG algorithms, be used in training linear text classifiers. In contrast to most IR methods, theoretical analysis provides performance guarantees and guidance on parameter settings for these algorithms. Experimental data is presented showing Widrow-Hoff and EG to be more effective than the widely used Rocchio algorithm on several categorization and routing tasks.
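As a rough sketch of what these training rules look like, the standard Widrow-Hoff (LMS) update and the exponentiated-gradient (EG) update for squared loss are shown below in Python. The learning rate, the tf-idf style document vector, and the relevance label are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def widrow_hoff_update(w, x, y, eta=0.1):
    """One Widrow-Hoff (LMS) step: an additive gradient-descent update of the
    weight vector for the squared loss (w.x - y)^2."""
    return w - 2.0 * eta * (np.dot(w, x) - y) * x

def eg_update(w, x, y, eta=0.1):
    """One EG (exponentiated gradient) step: a multiplicative update that keeps
    the weights positive and renormalizes them to sum to 1."""
    w_new = w * np.exp(-2.0 * eta * (np.dot(w, x) - y) * x)
    return w_new / w_new.sum()

# Toy usage on a single (document vector, relevance label) pair.
d = 5                                     # number of term features (hypothetical)
x = np.array([0.0, 0.3, 0.0, 0.7, 0.2])   # tf-idf style document vector (hypothetical)
y = 1.0                                   # relevance label

w_wh = widrow_hoff_update(np.zeros(d), x, y)
w_eg = eg_update(np.full(d, 1.0 / d), x, y)
print(w_wh, w_eg)
```

In practice each labeled training document would be presented in turn, with the chosen update rule applied once per document.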
Similar articles
Bayesian Classifiers Are Large Margin Hyperplanes in a Hilbert Space
Bayesian algorithms for Neural Networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian Learning Algorithms is that they don't simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output a...
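To make the "distribution over hypotheses" idea concrete, the sketch below contrasts posterior-averaged prediction with prediction from a single weight vector. The sampled linear hypotheses and their uniform posterior weights are purely illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "posterior": 100 sampled linear hypotheses (weight vectors)
# with equal posterior mass, an illustrative stand-in for a real Bayes posterior.
hypotheses = rng.normal(size=(100, 3))
posterior = np.full(100, 1.0 / 100)

def predict_with_posterior(x):
    """Average the +1/-1 decisions of every hypothesis, weighted by its
    posterior probability, instead of committing to a single hypothesis."""
    votes = np.sign(hypotheses @ x)
    return np.sign(posterior @ votes)

def predict_with_single_hypothesis(x):
    """Contrast: pick one hypothesis (here the posterior mean) and use only it."""
    return np.sign(hypotheses.mean(axis=0) @ x)

x = np.array([0.5, -1.0, 2.0])
print(predict_with_posterior(x), predict_with_single_hypothesis(x))
```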
Text segmentation in mixed-mode images using classification trees and transform tree-structured vector quantization
Multimedia applications such as educational videos and color facsimile contain images that are rich in both textual and continuous tone data. Because these two types of data have different properties, segmentation of the images into text and continuous tone data can improve compression by allowing different compression parameters or even algorithms to be employed on the different types. In this p...
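A minimal illustration of block-wise segmentation in this spirit is sketched below. The block features, the thresholds, and the simple threshold rule standing in for the paper's classification trees are all assumptions made for the example.

```python
import numpy as np

def block_features(block):
    """Simple per-block features (the paper's actual features likely differ):
    the number of distinct intensity levels and the intensity variance."""
    return len(np.unique(block)), float(block.var())

def classify_block(block, max_levels=16, min_var=500.0):
    """Tiny stand-in for a classification tree: text blocks tend to have few
    distinct levels and high contrast; continuous-tone blocks are smoother."""
    levels, var = block_features(block)
    if levels <= max_levels and var >= min_var:
        return "text"
    return "continuous-tone"

def segment(image, block_size=8):
    """Label each block_size x block_size block so that different compression
    parameters (or algorithms) can be applied per region."""
    h, w = image.shape
    labels = {}
    for i in range(0, h - block_size + 1, block_size):
        for j in range(0, w - block_size + 1, block_size):
            labels[(i, j)] = classify_block(image[i:i + block_size, j:j + block_size])
    return labels

# Toy image: left half bilevel "text", right half a smooth gradient.
img = np.zeros((16, 16), dtype=float)
img[:, :8] = np.where(np.arange(8) % 2 == 0, 0.0, 255.0)
img[:, 8:] = np.linspace(100, 130, 8)
print(segment(img))
```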
Bayesian Classifiers Are Large Margin Hyperplanes in a Hilbert Space, Produced as Part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150
Bayesian algorithms for Neural Networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian Learning Algorithms is that they don't simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output ...
Modify the linear search formula in the BFGS method to achieve global convergence.
A Sequential Algorithm for Training
The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertaint...
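Assuming the truncated method name is uncertainty sampling, the sketch below shows one round of sequential sampling: the unlabeled document whose predicted relevance is closest to 0.5 is selected for labeling and would then be added to the training set before retraining. The pool of document vectors and the logistic scoring function are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def most_uncertain(w, unlabeled):
    """Return the index of the unlabeled document whose predicted relevance
    probability is closest to 0.5, i.e. the one the current classifier is
    least certain about."""
    probs = sigmoid(unlabeled @ w)
    return int(np.argmin(np.abs(probs - 0.5)))

# Hypothetical pool of unlabeled document vectors and a current weight vector.
rng = np.random.default_rng(1)
pool = rng.random((20, 4))
w = rng.normal(size=4)

# One round of sequential sampling: ask an annotator to label the selected
# document, add it to the training set, and retrain the classifier.
idx = most_uncertain(w, pool)
print("query label for document", idx)
```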